



602 American Statistical Association. [38



ON THE VARIATE DIFFERENCE CORRELATION 
METHOD AND CURVE-FITTING. 

By Warren M. Persons, Harvard University. 



(Professors A. A. Young of Cornell University and E. E. Day of Harvard
University have generously read the manuscript of this paper and have made 
helpful suggestions. Computations and pertinent comments have been made 
by Mr. Edwin Frickey of Colorado College. W. M. P.) 



In Biometrika for April, 1914, "Student" generalized the 
method of differences for the elimination of spurious correlation 
due to order of items in time or space. In the same Journal 
for November, 1914, Dr. O. Anderson of Petrograd provided
the probable errors of the successive difference correlations 
of two series where the correlations of random pairs of the 
variates are zero. In the same number B. M. Cave and Karl 
Pearson presented "Numerical Illustrations of the Variate 
Difference Correlation Method," using Italian economic data. 

The conclusion of "Student" is this: "If we wish to elimi- 
nate variability due to position in time or space and to deter- 
mine whether there is any correlation between the residual 
variations, all that has to be done is to correlate the 1st, 
2nd, 3rd. . . nth differences between successive values of 
our variable with the 1st, 2nd, 3rd . . . nth differences 
between successive values of the other variable. When the 
correlation between the two nth differences is equal to that 
between the two (n+1)th differences, this value gives the
correlation required."* The meaning of "Student" is that
the correlation required is indicated by the ultimate steadiness
of values of the correlation coefficient for higher multiple
differences of the items.†

* Biometrika, April, 1914, p. 179.

† Biometrika, November, 1914, footnote p. 340.

‡ Ibid., pp. 354-355.

After working with 11 series of 28 items each Miss Cave and
Professor Pearson stated:‡ "In most cases our difference correlations
have hardly even with the sixth differences reached a
steady state. ... In the great bulk of instances there is 
still a more or less steady rising or falling appreciable in the 
difference correlations, and all we can really say is that the 
final value, the true $r_{XY}$, will be somewhat greater or less
than a given number. From an examination of the actual 
numerical working of the correlations, it appears to us that the 
terminal values are in the case of these short series of very 
great importance. It is further clear that the theory as given 
by 'Student' depends upon certain equalities which are not 
fulfilled in practice in short series. We await with much in- 
terest the complete publication of Dr. Anderson's work, and 
hope to find a fuller discussion of the allowance to be made in 
short series for the influence of the terminal state of affairs 
on the steadiness of the series and on the approach to the 
standard deviation formulae. But apart from these lesser 
points, our present numerical investigation has convinced us 
of the very great value of the new method of Variate Differ- 
ence Correlations." 

In the demonstration leading to his theorem, previously
quoted, "Student" stated that "if $x_1, x_2, x_3$, etc., $y_1, y_2, y_3$, etc.,
be corresponding values of the variables x and y, then if
$x_1, x_2, x_3$, etc., $y_1, y_2, y_3$, etc., are randomly distributed in
time and space, it is easy to show that the correlation between
the corresponding nth differences is the same as that
between x and y."* In the proof of this statement certain
assumptions were made as follows: first, that there is a significant
correlation between items with similar subscripts, i. e.,
$\sum x_K y_K \neq 0$; second, that if the items of each series be paired
with the preceding items of the same series the correlation
will be zero, i. e., $\sum x_K x_{K+1} = 0$ and $\sum y_K y_{K+1} = 0$; third, that
there is no significant correlation between the items of one
series and the items of the other when there is a lag in either
direction, i. e., $\sum x_K y_{K+1} = 0$, $\sum x_{K+1} y_K = 0$; fourth, that the
sum of the items of each series is zero, i. e., $\sum x_K = 0$, $\sum y_K = 0$;
fifth, that the time element in any series ordered in time can
be expressed as an algebraic function of time (t) of some degree,
the nth.

* Biometrika, April, 1914, p. 179.




It is my contention that these assumptions are such as
cannot be retained in applying the method to the most common
types of problems. For instance the pairing of items of
two time series is made possible by the position of those items 
in time either because they occur in the same time interval 
(concurrent) or in definitely related intervals (lag). Our 
problem may be, and usually is, not only to determine the cor- 
relation but to find what pairings give the maximum correla- 
tion. In such case the assumption that only one pairing is 
significant vitiates the conclusion at the outset. The writers 
on the variate difference correlation method all assume that
"the true $r_{XY}$" is for pairs concurrent in time.

Let the method of variate differences be applied to two time
series artificially constructed. Let series A be made up of
successive values of the function $2t+3$ with +1 and -2 alternately
added to the items. Let series B be made up of
successive values of the function $t^2+3t+2$ with -1
and +1 alternately added to the items. We will have the
following series and differences:

SERIES A.

t     Items    1st D    2nd D    3rd D

0       4
1       3       -1
2       8       +5       +6
3       7       -1       -6      -12
4      12       +5       +6      +12
5      11       -1       -6      -12
etc.

SERIES B.

t     Items    1st D    2nd D    3rd D

0       1
1       7       +6
2      11       +4       -2
3      21      +10       +6       +8
4      29       +8       -2       -8
5      43      +14       +6       +8
etc.




The process of taking differences eliminates the elements
due respectively to the first and second degree functions of
time. The oscillating elements remain. If concurrent items
for the third and higher differences are paired, we have
$r'''_0 = r^{iv}_0 = \ldots = r^{(n)}_0 = -1$. If the items are paired with a
lag of one in either direction we have
$r'''_{-1} = r'''_{+1} = r^{iv}_{-1} = r^{iv}_{+1} = \ldots = r^{(n)}_{-1} = r^{(n)}_{+1} = +1$.*
In this case the time elements
are eliminated as far as they can be by the method of differences
and yet the series resulting are not, properly speaking,
random in time. It will be of interest to determine if this
sort of oscillation occurs in actual data.
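The artificial example can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper `lagged_corr` and the choice of 20 items are illustrative and not part of the original paper. A positive lag pairs a later item of the first series with an earlier item of the second.

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Correlate x[k + lag] with y[k]; lag > 0 means x lags behind y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        x, y = x[lag:], y[:-lag]
    elif lag < 0:
        x, y = x[:lag], y[-lag:]
    return float(np.corrcoef(x, y)[0, 1])

t = np.arange(20)  # 20 items is an arbitrary illustrative length
# Series A: 2t + 3 with +1 and -2 added alternately.
series_a = 2 * t + 3 + np.where(t % 2 == 0, 1, -2)
# Series B: t^2 + 3t + 2 with -1 and +1 added alternately.
series_b = t**2 + 3 * t + 2 + np.where(t % 2 == 0, -1, 1)

# Third differences remove the linear and quadratic trends.
d3_a = np.diff(series_a, n=3)
d3_b = np.diff(series_b, n=3)

r_concurrent = lagged_corr(d3_a, d3_b, 0)
r_lag_plus = lagged_corr(d3_a, d3_b, +1)
r_lag_minus = lagged_corr(d3_a, d3_b, -1)
print(r_concurrent, r_lag_plus, r_lag_minus)
```

Up to floating-point rounding, the concurrent coefficient comes out at -1 and both lagged coefficients at +1, as stated in the text.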

II. 

I have recently been working with some 21 series of economic 
statistics for the United States for the period 1879 to 1913, giv- 
ing 35 items to each series. Application of the variate differ- 
ence correlation method to such series has forced me to the 
conclusion that neither the possibilities nor the limitations of 
the method, when applied to short series, have been appreciated
by the writers on the subject.†

* In $r_i$ the subscript i indicates the number of time intervals that the items of series A precede (-) or lag
behind (+) the corresponding items of series B.

† At the conclusion of his article in Biometrika, April, 1914, pp. 269-279, Mr. O. Anderson makes the
following statement, which I translate:

"If we take into consideration that for our purposes the evolutionary component of a series has disappeared
if it becomes so small relative to the oscillatory component that it can influence only the 3rd, 4th,
etc., decimal place of the expression for R [the coefficient of correlation] then we may conclude that not
only components which are represented by a parabola of higher order, but also those represented by
transcendental functions (such as a sine curve) become eliminated by taking a finite number of differences.
Further it may be shown that generally all more or less 'smooth series,' all of which are characterized by a
considerable degree of positive correlation between adjacent items, lose the character of smoothness in the
process of multiple differences. The generalized Cave-Hooker procedure is, therefore, manifestly a quite
universal means of sifting out the correlation between the oscillatory elements of complex series."

The applications of the method by "Student" and by Miss
Cave and Professor Pearson are not satisfactory tests: first,
because they have applied it merely to items of the same date,
thus assuming that the real correlation can exist only for such
pairs; and, second, because they assume that only the correlation
indicated by the steadiness of coefficients between higher differences
is significant, the coefficient for first differences not
being significant unless it is supported by steadiness of the
coefficients of higher differences. My objections to these assumptions
and the conclusions based upon them will be illustrated
by application of the variate difference correlation
method to American data for the period 1879-1913. The
series used are the following:

1. Wholesale prices of commodities. 

2. Gross receipts of railroads. 

3. Net earnings of railroads. 

4. Coal production. 

5. Exports from the United States. 

6. Imports into the United States. 

7. Pig-iron production. 

8. Price of pig-iron. 

9. Immigration (fiscal year). 

10. Shares sold on the New York Stock Exchange. 

11. Average price of shares sold on the New York Stock 

Exchange. 

12. New York clearings plus five times outside clearings 

(called clearing index). 

13. Clearing index divided by relative wholesale prices 

(called corrected clearing index). 

14. New railroad mileage constructed. 

15. Per cent, of business failures. 

16. Liabilities of business failures. 

17. Balance of trade. 

18. Weighted index numbers of the yield per acre of nine 

leading crops. 

19. Ratio of loans to resources of banks. 

20. Ratio of cash to deposits of banks. 

21. Surplus reserves of New York associated banks. 

In each case, except for the clearing index, I assumed the
secular trend to be linear. A straight line was fitted to each
series by the method of least squares or the method of moments.
For the clearing index I assumed the secular trend to
be the compound interest law and the function $y = BC^t$, where
t represents time and B and C are constants determined by the
data, was fitted to the series. The deviations of the raw figures
from the lines of secular trend were found and designated
the "cycles."
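The trend-and-cycle step just described can be sketched as follows, assuming NumPy. The function names and the sample figures are hypothetical. Fitting $y = BC^t$ by least squares on the logarithms is an assumption made here for illustration; the paper does not say how B and C were determined.

```python
import numpy as np

def linear_trend(y):
    """Least-squares straight line through y, evaluated at t = 0, 1, ..."""
    t = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(t, np.asarray(y, dtype=float), 1)
    return intercept + slope * t

def compound_interest_trend(y):
    """Fit y = B * C**t by least squares on log(y) (assumed method)."""
    t = np.arange(len(y), dtype=float)
    log_c, log_b = np.polyfit(t, np.log(np.asarray(y, dtype=float)), 1)
    return np.exp(log_b) * np.exp(log_c) ** t

def cycles(y, trend):
    """Deviations of the raw figures from the line of secular trend."""
    return np.asarray(y, dtype=float) - trend

# Hypothetical figures, for illustration only.
t = np.arange(35)
raw_linear = 10.0 + 2.0 * t + np.where(t % 2 == 0, 1.5, -1.5)
raw_growth = 5.0 * 1.04 ** t

cycles_linear = cycles(raw_linear, linear_trend(raw_linear))
cycles_growth = cycles(raw_growth, compound_interest_trend(raw_growth))
```

With a least-squares line that includes a constant term, the cycles of `raw_linear` sum to zero, which agrees with the zero-mean condition used later in the derivation.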






TABLE I.

COEFFICIENTS OF CORRELATION FOR THE CYCLES OF THE BUSINESS BAROMETER
AND THE CYCLES OF WHOLESALE PRICES, GROSS RECEIPTS OF RAILROADS,
CORRECTED CLEARING INDEX AND SURPLUS RESERVES OF NEW YORK BANKS,
TOGETHER WITH COEFFICIENTS FOR MULTIPLE DIFFERENCES, FIRST TO SIXTH,
WITH VARIOUS DEGREES OF LAG, 1879-1913.

Coefficients of Correlation. (a)

Business Barometer
Correlated with                        r_-1     r_0     r_+1

Wholesale prices
  Cycles                               +.90    +.95     +.85
  1st D                                +.20    +.78     -.03
  2nd D                                -.22    +.78     -.46
  3rd D                                -.35    +.78     -.57
  4th D                                -.40    +.76     -.59
  5th D                                -.42    +.74     -.58
  6th D                                 ...    +.71      ...

Gross Receipts of Railroads
  Cycles                               +.91    +.94     +.79
  1st D                                +.34    +.80     -.28
  2nd D                                +.08    +.74     -.64
  3rd D                                -.06    +.74     -.71
  4th D                                -.18    +.76     -.74
  5th D                                -.24    +.75     -.75
  6th D                                 ...    +.74      ...

Corrected Clearing Index
  Cycles                               +.51    +.77     +.80
  1st D                                -.35    +.32     +.52
  2nd D                                -.55    +.23     +.38
  3rd D                                -.55    +.21     +.27
  4th D                                -.53    +.21     +.18
  5th D                                -.53    +.24     +.07
  6th D                                -.52    +.28     -.05

Surplus Reserves of New York Banks
  Cycles                               -.27    -.60     -.62
  1st D                                +.33    -.72     +.02
  2nd D                                +.56    -.74     +.23
  3rd D                                +.64    -.75     +.35
  4th D                                +.71    -.77     +.42
  5th D                                +.75    -.78     +.49
  6th D                                +.79    -.81     +.59

[Entries marked ... and the entries for lags of two and three years cannot be
assigned to rows in this copy. In source order the unplaced values read:
+.12, +.25, -.36, -.48, +.28, +.49, +.53, -.46 (before the r_-1 columns);
+.25; +.66, -.04, -.29, -.25, -.18, -.07, +.05; 20, 09, +.09, .20, +.17
(after the r_+1 columns).]

(a) The subscript i of the coefficient of correlation $r_i$ indicates the years lag (+) or years previous (-)
of the Business Barometer.



Investigation of the 21 series of cycles led to the conclusion
that the fluctuations of 9 of them synchronize and hence can
logically be combined into a business barometer. Consequently
a business barometer is constructed of the 9 series, those
numbered 1 to 9 in the list given above.*
The coefficients of correlation for the cycles of the business
barometer and the cycles of wholesale prices, gross receipts of
railroads, corrected clearing index and surplus reserves of
New York associated banks, together with the coefficients of
correlation for the first to sixth differences, with various degrees
of lag for cycles and differences, are given in Table I. Of
course there is an element of spurious correlation in the coefficients
for the business barometer, wholesale prices, and gross
receipts of railroads because the two last named series enter
into the barometer. That element is not believed to be large
for the cycles and first differences.

* For details of this investigation see the article by the writer in the American Economic Review,
December, 1916.

Examination of Table I reveals: first, high, positive, and 
steady coefficients for concurrent items for wholesale prices 
and gross receipts of railroads, high, negative, and steady 
coefficients for surplus reserves, low, positive, and fairly steady 
coefficients for the corrected clearing index; second, all the 
coefficients for higher differences show a marked tendency to 
alternate in algebraic sign as successive degrees of lag are 
taken in either direction. 



TABLE II.

PROBABLE ERRORS OF COEFFICIENTS OF CORRELATION FOR SERIES OF 35 AND 46
ITEMS, RESPECTIVELY, AND FOR THEIR MULTIPLE DIFFERENCES, FIRST TO
SIXTH.(a)

(Thirty-five Items in Original Series.)

Coefficients of
Correlation.      Items.   1st D.   2nd D.   3rd D.   4th D.   5th D.   6th D.

    .90            .02      .03      .03      .03      .04      .04      .04
    .80            .04      .05      .06      .07      .07      .08      .08
    .70            .06      .07      .08      .09      .10      .11      .11
    .60            .07      .09      .10      .12      .12      .13      .14
    .50            .09      .11      .12      .13      .15      .16      .16
    .40            .10      .12      .14      .15      .16      .17      .18
    .30            .10      .13      .15      .16      .18      .19      .20
    .20            .11      .14      .16      .17      .19      .20      .21
    .10            .11      .14      .16      .18      .19      .21      .22
    .00            .11      .14      .16      .18      .19      .21      .22

(Forty-six Items in Original Series.)

    .90            .02      .02      .03      .03      .03      .03      .04
    .80            .04      .04      .05      .06      .06      .06      .07
    .70            .05      .06      .07      .08      .09      .09      .10
    .60            .06      .08      .09      .10      .11      .11      .12
    .50            .08      .09      .11      .12      .13      .13      .14
    .40            .08      .10      .12      .13      .14      .15      .16
    .30            .09      .11      .13      .14      .15      .16      .17
    .20            .10      .12      .14      .15      .16      .17      .18
    .10            .10      .12      .14      .15      .17      .18      .19
    .00            .10      .12      .14      .16      .17      .18      .19

(a) Computed from formulae developed by O. Anderson and A. Ritchie-Scott. See Biometrika,
November, 1915, p. 136.




The significance of the various coefficients depends not only 
on their size but upon the number of items used in the com- 
putation. The probable errors for various coefficients based 
on 35 items in the original series, first to sixth differences are 
given in Table II.* The probable errors for sixth differences 
are, approximately, twice those for the original series. A 
coefficient of, say .45 or more would be significant for the 
original series and of .65 or more for sixth differences. What 
is the explanation of the observed steadiness, and of the 
alternation of sign of coefficients for various degrees of lag? 
"Student" believes that the steadiness is due to the random 
distribution, with respect to time, of the differences. The 
alternation in sign is a phenomenon not noticed, or if noticed 
not considered, by the writers on the subject. 
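For the original series alone, the tabulated probable errors can be compared with the classical large-sample probable error of a correlation coefficient, $0.6745(1-r^2)/\sqrt{n}$. The sketch below assumes that this classical formula underlies the "Items" column. It does not attempt the Anderson and Ritchie-Scott formulae for differences, which are not given in this paper. The function name is illustrative.

```python
import math

def probable_error(r, n):
    """Classical probable error of a correlation coefficient r from n pairs."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

coefficients = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00]
items_column_35 = [round(probable_error(r, 35), 2) for r in coefficients]
print(items_column_35)
```

For 35 items this agrees, to two decimals, with the "Items" column of Table II. The growth of the probable error with the order of difference is what raises the threshold of significance from about .45 to about .65 in the text.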

First, let us consider the phenomenon of alternation in sign.
Let one of the original series be $x_0, x_1, x_2, \ldots, x_{n-1}$, the
first differences being $x_1 - x_0, x_2 - x_1, \ldots, x_{n-1} - x_{n-2}$.†
Suppose the first differences alternate in sign at any point so
that we have

$$x_K - x_{K-1} = +a, \quad x_{K+1} - x_K = -b, \quad x_{K+2} - x_{K+1} = +c, \quad x_{K+3} - x_{K+2} = -d, \quad \text{etc.,}$$

where a, b, c . . . are positive numbers. The second
differences are -b-a, +c+b, -d-c, . . ., a series alternating
in sign and larger numerically than the series from
which it is derived. For the portion of the original series,
however, where there is no alternation in sign the first differ- 
ences will be smaller numerically than the items from which 
they are derived. Since the nth differences are derived from 
the (n— l)th differences in the same way that the first differ- 
ences are derived from the original series we have the following 
conclusion: If consecutive items of a series alternate in sign 
the first and higher differences will also alternate in sign and 
the resulting items will increase numerically as the order of 

* These probable errors are computed from theories developed by O. Anderson, Biometrika, April,
1914, p. 269. The formulae for probable errors may hold for a very large number of items but I
doubt their validity for less than 100 items.

† The standard notation for successive finite differences is $\Delta$, $\Delta^2$, $\Delta^3$, etc. That notation is not
used in this paper because it would tend to conceal relations brought out in Part III of this paper.




the difference increases. A succession of like signs may persist 
with the first and higher differences but the numbers resulting 
will be smaller numerically than those resulting where the 
sign alternates. Where the variate difference method is applied
to two short series we may, therefore, expect the terms
alternating in sign to be of dominating influence upon the
coefficient of correlation. Also when a lag is taken in either
direction the coefficients will tend to alternate in sign.
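The arithmetic behind this conclusion can be illustrated with a short sketch, assuming NumPy. The numerical values of a, b, c, d, e below are hypothetical. Differencing a run that alternates in sign adds adjacent magnitudes, while differencing a run of like sign subtracts them.

```python
import numpy as np

# Hypothetical first differences +a, -b, +c, -d, +e with a..e positive.
alternating = np.array([3.0, -2.0, 4.0, -1.0, 5.0])
# Differencing gives -b-a, +c+b, -d-c, +e+d: adjacent magnitudes are added.
diff_of_alternating = np.diff(alternating)

# A run of like sign with the same magnitudes.
steady = np.abs(alternating)
# Differencing gives b-a, c-b, d-c, e-d: adjacent magnitudes are subtracted.
diff_of_steady = np.diff(steady)

print(diff_of_alternating)  # signs still alternate, magnitudes grow
print(diff_of_steady)       # magnitudes shrink
```

Because the alternating stretches grow with each differencing and the steady stretches shrink, the alternating terms come to dominate the sums of products from which the coefficient is computed.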

TABLE III.

PERCENTAGES FOUND BY TAKING THE RATIO OF THE NUMBER OF CASES IN WHICH
SUCCESSIVE ITEMS DIFFER IN SIGN TO THE POSSIBLE NUMBER OF ALTERNATIONS
IN SIGN OF SUCCESSIVE ITEMS; VARIOUS SERIES, FIRST TO SIXTH DIFFERENCES,
1879-1913.

Percentage of Unlike Signs. (a)

Series
(in source order;
names not legible)   1st D.   2nd D.   3rd D.   4th D.   5th D.   6th D.

   1                  .55      .59      .64      .87      .79      .79
   2                  .42      .59      .61      .63      .83      .86
   3                  .58      .62      .68      .70      .69      .75
   4                  .33      .66      .68      .70      .76      .82
   5                  .55      .66      .68      .73      .72      .82
   6                  .58      .62      .71      .70      .79      .82
   7                  .42      .59      .61      .67      .72      .79
   8                  .52      .69      .74      .83      .93      .93
   9                  .52      .78      .84      .83      .86      .89
  10                  .55      .75      .74      .77      .76      .86
  11                  .61      .62      .71      .77      .79      .79
  12                  .42      .62      .68      .80      .83      .82
  13                  .58      .66      .68      .80      .79      .82
  14                  .55      .62      .61      .60      .59      .71
  15                  .30      .53      .68      .70      .83      .82
  16                  .55      .62      .61      .73      .79      .79
  17                  .45      .66      .64      .70      .69      .71
  18                  .55      .66      .81      .83      .83      .82
  19                  .48      .66      .77      .93      .93      .93

  Average             .50      .64      .70      .75      .79      .82

(a) A + or - item followed by a null item or vice versa is counted ½. The possible numbers of unlike
signs are 33, 32, 31, 30, 29 and 28 for the differences, 1st to 6th.

Table III shows the tendency of the higher differences of
the series here under consideration to alternate in sign. With
34 items in the series of first differences there are 33 possible
alternations in sign when each term is compared with the
preceding term; there are also 33 possible cases of steadiness
in sign. Counting the number of times that the signs of successive
terms of each series alternate and expressing the number
as a percentage of the possible number (32 for second
differences, 31 for third differences, etc.) we have the percentages
appearing in the table. For numbers chosen at
random we should expect the first differences to have 50 per
cent. steadiness in sign and 50 per cent. alternation, on the
average. We get exactly 50 per cent. as the average for the
series listed. For second and higher differences the average
percentages of alternation are: 64, 70, 75, 79, 82. A marked
tendency to alternation in sign is revealed. The cumulative
effect of this tendency to alternate in sign as higher differences
are taken is also revealed in Table IV. Where the series A,
B, C, D, and E are correlated with themselves (AA, BB, CC,
DD, EE) there is a numerically increasing but negative coefficient
for a lag of one item. Where all possible combinations
of the five series are taken the coefficients alternate in sign
or show a strong tendency to do so with the higher differences.

[TABLE IV. Printed sideways in the original; its title and entries are not legible in this copy.]
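The drift of the percentages in Table III can be imitated with a simulation. The sketch below assumes NumPy and a long Gaussian random sequence as the starting series. The starting series is assumed to play the role that the first differences play in Table III, where about half the signs alternate. For such a sequence the lag-one correlation of its mth differences is $-m/(m+1)$, a standard result for uncorrelated items, so both the negative self-correlation at lag one and the share of unlike signs should rise with m.

```python
import numpy as np

def alternation_share(values):
    """Share of adjacent pairs whose signs differ (zeros assumed absent)."""
    signs = np.sign(values)
    return float(np.mean(signs[:-1] * signs[1:] < 0))

def lag_one_autocorr(values):
    """Correlation of a series with itself lagged one item."""
    return float(np.corrcoef(values[:-1], values[1:])[0, 1])

rng = np.random.default_rng(0)          # seed chosen arbitrarily
base = rng.standard_normal(200_000)     # hypothetical random starting series

shares, autocorrs = {}, {}
for m in range(6):
    d = np.diff(base, n=m) if m else base
    shares[m] = alternation_share(d)
    autocorrs[m] = lag_one_autocorr(d)

for m in range(6):
    print(m, round(shares[m], 2), round(autocorrs[m], 2))
```

The long sequence is used only to make the tendency visible; with 35 items, as in the paper, individual series scatter widely about these values.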

TABLE V.

COEFFICIENTS OF CORRELATION BETWEEN TWO RANDOM SERIES OF 35 ITEMS EACH
AND BETWEEN THEIR MULTIPLE DIFFERENCES, FIRST TO EIGHTH, CONCURRENT
AND A LAG OF ONE AND OF TWO ITEMS IN EITHER DIRECTION.

                          Coefficients of Correlation.

Items Paired.          r_-2     r_-1     r_0      r_+1     r_+2

Original items         +.32     +.08     -.11     -.42     -.05
1st differences        +.32     -.03     +.04     -.29     +.01
2nd differences        +.30     -.15     +.12     -.21     +.02
3rd differences        +.28     -.24     +.21     -.19     +.04
4th differences        +.28     -.28     +.23     -.20     +.07
5th differences        +.31     -.32     +.27     -.18     +.10
6th differences        +.31     -.34     +.29     -.21     +.13
7th differences        +.31     -.36     +.32     -.26     +.18
8th differences        +.30     -.36     +.34     -.30     +.22



This theory of the tendency of signs of terms of higher 
differences to alternate and, therefore, to affect the coeffi- 
cients of correlation was tested by applying the method of 
variate differences to two random series of 35 items each. 
The method of selection of the numbers was this: the pages 
of a table of six-place logarithms were turned at random, the 
tip of a pointer was placed at random on the page and the 
two digits at the right of the logarithm indicated by the 
pointer were taken as the items of the series. The coefficients 
of correlation between the two random series of 35 items and 
between their multiple differences, first to eighth, concurrent 




and for one and two items lag in each direction are given in 
Table V. For the first and higher differences there is a per- 
sistent alternation in sign as we take a lag in either direction.* 
The coefficients alternate in sign, of course, because, first, the 
two series correlated alternate in sign, second, the terms 
alternating in sign become the dominating ones when the 
products of corresponding items are taken and third, lagging 
either series in either direction will bring a different set of 
signs into correspondence. The term $\sum xy$ tends to alternate
in sign and hence r does. 
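The experiment of Table V can be repeated with a pseudo-random generator in place of the table of logarithms. This is a minimal sketch, assuming NumPy; the seed, the helper `lagged_corr`, and the use of `integers(0, 100)` for two-digit items are illustrative choices, and the figures obtained will differ from those of Table V.

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Correlate x[k + lag] with y[k]; lag > 0 means x lags behind y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        x, y = x[lag:], y[:-lag]
    elif lag < 0:
        x, y = x[:lag], y[-lag:]
    return float(np.corrcoef(x, y)[0, 1])

rng = np.random.default_rng(1917)       # arbitrary seed
x = rng.integers(0, 100, size=35)       # two-digit random items
y = rng.integers(0, 100, size=35)

lags = (-2, -1, 0, 1, 2)
table = {}
for order in range(9):                  # original items, then 1st to 8th differences
    dx = np.diff(x, n=order) if order else x
    dy = np.diff(y, n=order) if order else y
    table[order] = [lagged_corr(dx, dy, lag) for lag in lags]

for order, row in table.items():
    print(order, [f"{r:+.2f}" for r in row])
```

Reading each printed row from left to right shows whether the signs alternate across lags for the higher differences, which is the behavior the text attributes to the differencing itself rather than to any real relation between the series.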

III. 

Does this phenomenon of the alternation in sign of the 
coefficients, left to right in the tables, have any bearing on the 
steadiness or unsteadiness of the coefficients, in the tabular 
columns, based on successive differences but with the same 
lag throughout? This question will now be considered. 

The series of nth differences is derived from the series of
(n-1)th differences by the same process that the series of first
differences is derived from the original figures. Therefore,
the expression for the coefficient of correlation for nth differences
is the same function of the (n-1)th differences that the
coefficient for first differences is of the original series.

Let $r_L^{(m)}$ represent the coefficient of correlation between the
mth differences of the series $x_0, x_1, \ldots, x_{n-1}$ and $y_0, y_1, \ldots, y_{n-1}$,
where the subscript L denotes the lag of the
x series. When L = +1 we have the pairs $x_1 y_0, x_2 y_1, \ldots, x_{n-1} y_{n-2}$;
when L = -1 we have the pairs $x_0 y_1, x_1 y_2, \ldots, x_{n-2} y_{n-1}$, etc.
The formula for the coefficient of correlation
between concurrent items of the original series is usually
written in the form

$$r_0 = \frac{\sum_{K=0}^{n-1} (x_K - \bar{x})(y_K - \bar{y})}{\sqrt{\sum_{K=0}^{n-1} (x_K - \bar{x})^2 \sum_{K=0}^{n-1} (y_K - \bar{y})^2}} \tag{1}$$

where $\bar{x}$ and $\bar{y}$ are arithmetic averages of the respective series.

* None of the coefficients found signify appreciable correlation between the series. In but one case is
the coefficient more than three times its probable error and in the majority of cases the probable error is
approximately the same as the coefficient.




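The function $r_0$ may be written

$$r_0 = \frac{n \sum_{0}^{n-1} x_K y_K - \sum_{0}^{n-1} x_K \sum_{0}^{n-1} y_K}{\sqrt{\left[ n \sum_{0}^{n-1} x_K^2 - \left( \sum_{0}^{n-1} x_K \right)^2 \right] \left[ n \sum_{0}^{n-1} y_K^2 - \left( \sum_{0}^{n-1} y_K \right)^2 \right]}} \tag{2}$$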



The coefficient of correlation $r'_0$ between concurrent first differences
$x_1 - x_0,\ y_1 - y_0$; $x_2 - x_1,\ y_2 - y_1$; $\ldots$; $x_{n-1} - x_{n-2},\ y_{n-1} - y_{n-2}$
is, noticing that $\sum_{1}^{n-1} (x_K - x_{K-1}) = x_{n-1} - x_0$ and
$\sum_{1}^{n-1} (y_K - y_{K-1}) = y_{n-1} - y_0$, in terms of the original items,

$$r'_0 = \frac{(n-1) \sum_{1}^{n-1} (x_K - x_{K-1})(y_K - y_{K-1}) - (x_{n-1} - x_0)(y_{n-1} - y_0)}{\sqrt{\left[ (n-1) \sum_{1}^{n-1} (x_K - x_{K-1})^2 - (x_{n-1} - x_0)^2 \right] \left[ (n-1) \sum_{1}^{n-1} (y_K - y_{K-1})^2 - (y_{n-1} - y_0)^2 \right]}} \tag{3}$$

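Assuming $x_{n-1} - x_0$ and $y_{n-1} - y_0$ to be negligible in comparison
with other terms of the function (called assumption a)
we will discard the former. The function becomes

$$r'_0 = \frac{2 \sum_{0}^{n-1} x_K y_K - (x_0 y_0 + x_{n-1} y_{n-1}) - \left( \sum_{1}^{n-1} x_{K-1} y_K + \sum_{1}^{n-1} x_K y_{K-1} \right)}{\sqrt{\left[ 2 \sum_{0}^{n-1} x_K^2 - (x_0^2 + x_{n-1}^2) - 2 \sum_{1}^{n-1} x_K x_{K-1} \right] \left[ 2 \sum_{0}^{n-1} y_K^2 - (y_0^2 + y_{n-1}^2) - 2 \sum_{1}^{n-1} y_K y_{K-1} \right]}} \tag{4}$$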

Assuming that the original items of each series are random in
order, first with respect to the adjacent items of the same series
and, second, with respect to the items adjacent to the concurrent
item of the other series, that is assuming
$\sum_{1}^{n-1} x_{K-1} y_K$, $\sum_{1}^{n-1} x_K y_{K-1}$, $\sum_{1}^{n-1} x_K x_{K-1}$, and $\sum_{1}^{n-1} y_K y_{K-1}$ all equal zero
(called assumption b) we have

$$r'_0 = \frac{2 \sum_{0}^{n-1} x_K y_K - (x_0 y_0 + x_{n-1} y_{n-1})}{\sqrt{\left[ 2 \sum_{0}^{n-1} x_K^2 - (x_0^2 + x_{n-1}^2) \right] \left[ 2 \sum_{0}^{n-1} y_K^2 - (y_0^2 + y_{n-1}^2) \right]}} \tag{5}$$

Assuming $x_0 y_0 + x_{n-1} y_{n-1}$, $x_0^2 + x_{n-1}^2$ and $y_0^2 + y_{n-1}^2$ to be negligible
in comparison with other terms in the function (called
assumption c) we have, discarding the three terms named,

$$r'_0 = \frac{\sum_{0}^{n-1} x_K y_K}{\sqrt{\sum_{0}^{n-1} x_K^2 \sum_{0}^{n-1} y_K^2}} = r_0, \text{ approximately, if } \bar{x} = \bar{y} = 0.$$

That is, if at any time assumptions a, b, and c hold true for
any series of multiple differences, the coefficient of correlation
for their first differences will equal the coefficient for the items
from which the first differences were derived. If the same
assumptions (a, b, and c) hold true for the series of first differences
then $r''_0 = r'_0$. Further if the assumptions hold true repeatedly
for successive differences from and after the pth,
we have (m > p)

$$r_0^{(m)} = r_0^{(m-1)} = \cdots = r_0^{(p+1)} = r_0^{(p)}.$$

It is obvious then, that the coefficients of correlation for suc- 
cessive differences will remain stable if assumptions a, b, and 
c hold true for successive differences. The condition (assump- 
tion that a, b, and c hold true repeatedly) is sufficient to pro- 
duce stability, but is it necessary? 
Consider form (4) of the function $r'_0$. If the two terms,
$\sum_{1}^{n-1} x_K x_{K-1}$ and $\sum_{1}^{n-1} y_K y_{K-1}$, appearing in the denominator are
negative in sign, if the expression $\left( \sum_{1}^{n-1} x_{K-1} y_K + \sum_{1}^{n-1} x_K y_{K-1} \right)$ is
opposite in sign to $\sum_{0}^{n-1} x_K y_K$, and if the terms are of appropriate
size as well as sign, if these assumptions hold true,
dropping $\sum_{1}^{n-1} x_K x_{K-1}$, $\sum_{1}^{n-1} y_K y_{K-1}$, and $\left( \sum_{1}^{n-1} x_{K-1} y_K + \sum_{1}^{n-1} x_K y_{K-1} \right)$
will not affect the value of $r'_0$. Also $r'_0$ will approximately
equal $r_0$. In other words the fulfilling of the assumptions
named will result in stable coefficients for successive differences.
It is my contention that for a moderate number of
items (n = 35 or 40) the conditions here specified are apt to
occur and be the cause of any stability of the coefficients of
correlation between multiple differences.
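Forms (4) and (5) can be written out directly and compared on any pair of series. The sketch below assumes NumPy; the function names and the random test data are hypothetical. Form (4) is the ratio of the sum of products of the first differences to the root of the product of their sums of squares, expressed in the original items, and form (5) is the same expression with the lagged sums dropped under assumption b.

```python
import numpy as np

def form4(x, y):
    """Form (4): correlation of first differences, end correction (a) dropped."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cross = np.sum(x[:-1] * y[1:]) + np.sum(x[1:] * y[:-1])
    num = 2 * np.sum(x * y) - (x[0] * y[0] + x[-1] * y[-1]) - cross
    den_x = 2 * np.sum(x**2) - (x[0]**2 + x[-1]**2) - 2 * np.sum(x[1:] * x[:-1])
    den_y = 2 * np.sum(y**2) - (y[0]**2 + y[-1]**2) - 2 * np.sum(y[1:] * y[:-1])
    return float(num / np.sqrt(den_x * den_y))

def form5(x, y):
    """Form (5): form (4) with the lagged sums set to zero (assumption b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    num = 2 * np.sum(x * y) - (x[0] * y[0] + x[-1] * y[-1])
    den_x = 2 * np.sum(x**2) - (x[0]**2 + x[-1]**2)
    den_y = 2 * np.sum(y**2) - (y[0]**2 + y[-1]**2)
    return float(num / np.sqrt(den_x * den_y))

rng = np.random.default_rng(35)         # arbitrary seed, hypothetical data
x = rng.standard_normal(35)
y = 0.6 * x + rng.standard_normal(35)

dx, dy = np.diff(x), np.diff(y)
uncentered = float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))
print(form4(x, y), uncentered, form5(x, y))
```

The first two printed numbers agree exactly, which confirms the algebra of form (4). How far form (5) departs from them depends on the size of the lagged sums, which is the point at issue in the text.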






TABLE VI.

COEFFICIENTS OF CORRELATION BETWEEN THE BUSINESS BAROMETER AND THE
CLEARING INDEX FOR THE UNITED STATES, FIRST TO SIXTH DIFFERENCES,
CONCURRENT AND LAG OF ONE AND TWO ITEMS IN EITHER DIRECTION, 1879-1913.

                    Coefficients of Correlation.

Difference.      r_-1     r_0      r_+1     r_+2

First            -.33     +.68     +.39     -.22
Second           -.65     +.59     +.17     -.35
Third            -.71     +.57     +.02     -.26
Fourth           -.72     +.57     -.07     -.19
Fifth            -.71     +.57     -.14     -.06
Sixth            -.71     +.58     -.25     +.11

The results of applying the variate difference correlation
method to the business barometer and the clearing index
(not corrected for prices) are tabulated in Table VI. The
first to sixth differences were used and the items were lagged
in both directions. The stability of the coefficients for concurrent
items second to sixth differences ($r''_0, r'''_0, r^{iv}_0, r^{v}_0, r^{vi}_0$)
and the instability of the coefficients for one item lag in the
business barometer ($r'_{+1}$ to $r^{vi}_{+1}$) are noticeable. Let us investigate
the stability of $r'''_0$ and $r^{iv}_0$ and the instability of $r'''_{+1}$ and
$r^{v}_{+1}$. Using form (4) we have these values:



Term.                                              $r_0'''$   $r_0^{iv}$   $r_{+1}'''$   $r_{+1}^{v}$
$2\sum x_K y_K$                                     +3134     +2718      +888       -950
$(x_0 y_0 + x_{n-1} y_{n-1})$                         -10       -10        -9        +12
$(\sum x_{K-1} y_K + \sum x_K y_{K-1})$             -1293     -1618      +681      +2506
$2\sum x_K^2$                                       +7918     +2126     +7676      +6128
$(x_0^2 + x_{n-1}^2)$                                +122       +58      +122        +37
$2\sum x_K x_{K-1}$                                 -3132     -1276     -2890      -4254
$2\sum y_K^2$                                       +3596    +10536     +3498     +32814
$(y_0^2 + y_{n-1}^2)$                                 +50      +250        +5      +1360
$2\sum y_K y_{K-1}$                                 -1722     -6298     -1694     -21556






Computing $r_0'''$, $r_0^{iv}$, $r_{+1}'''$ and $r_{+1}^{v}$ in two ways, by form (4)
and form (5), we have the following results:

                Form (4).   Form (5).
$r_0'''$          +.59        +.59
$r_0^{iv}$        +.58        +.59
$r_{+1}'''$       +.03        +.17
$r_{+1}^{v}$      -.15        -.07



The stability of $r_0'''$ and $r_0^{iv}$ is explainable by the balancing effect
of items $(\sum x_{K-1} y_K + \sum x_K y_{K-1})$, $\sum x_K x_{K-1}$ and
$\sum y_K y_{K-1}$; the instability of $r_{+1}'''$ and $r_{+1}^{v}$ is explainable
by the lack of balance between those items as they appear in the function. The
items named are not approximately zero, in the case of
$r_0'''$ and $r_0^{iv}$, as "Student" assumes, but numerically as large as
$\sum x_K y_K$.
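
The balancing can be checked directly from the tabulated terms. The following is a minimal sketch in Python, not the author's own computation. It assumes that form (4) is the ratio built from the nine tabulated terms as coded in `r_form4`, and that form (5) is the same ratio with the three cross-product terms dropped; both forms are defined earlier in the paper, so this reading is an assumption, and the function names and dictionary keys are illustrative only. Under that assumption the sketch agrees with the printed coefficients to within about .01.

```python
from math import sqrt

# One tuple per column of the table of terms, in the order
# (2*Sxy, end_xy, cross_xy, 2*Sxx, end_xx, 2*Sx_x1, 2*Syy, end_yy, 2*Sy_y1).
# The keys are illustrative labels, not the paper's notation.
TERMS = {
    "concurrent_a": (3134, -10, -1293, 7918, 122, -3132, 3596, 50, -1722),
    "concurrent_b": (2718, -10, -1618, 2126, 58, -1276, 10536, 250, -6298),
    "lagged_a": (888, -9, 681, 7676, 122, -2890, 3498, 5, -1694),
    "lagged_b": (-950, 12, 2506, 6128, 37, -4254, 32814, 1360, -21556),
}

# Coefficients printed in the paper for the same columns: (form 4, form 5).
PRINTED = {
    "concurrent_a": (0.59, 0.59),
    "concurrent_b": (0.58, 0.59),
    "lagged_a": (0.03, 0.17),
    "lagged_b": (-0.15, -0.07),
}


def r_form4(t):
    """Assumed form (4): every tabulated term is kept."""
    sxy, exy, cxy, sxx, exx, cxx, syy, eyy, cyy = t
    numerator = sxy - exy - cxy
    denominator = sqrt((sxx - exx - cxx) * (syy - eyy - cyy))
    return numerator / denominator


def r_form5(t):
    """Assumed form (5): the three cross-product terms are discarded."""
    sxy, exy, _, sxx, exx, _, syy, eyy, _ = t
    return (sxy - exy) / sqrt((sxx - exx) * (syy - eyy))


for key, terms in TERMS.items():
    print(key, round(r_form4(terms), 3), round(r_form5(terms), 3), PRINTED[key])
```

For the concurrent columns the discarded terms are large but enter numerator and denominator in nearly the same proportion, so the two forms agree; for the lagged columns they do not, and the two forms diverge.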

TABLE VII.

COEFFICIENTS OF CORRELATION BETWEEN CONCURRENT ITEMS OF BUSINESS
BAROMETER AND PRICES, RAILROAD GROSS EARNINGS, CORRECTED CLEARING
INDEX AND SURPLUS RESERVES OF NEW YORK BANKS COMPUTED (I) BY EXACT
FORMULA, (II) BY DISCARDING $\sum(x_K - x_{K-1})$ AND $\sum(y_K - y_{K-1})$ AND (III) BY
DISCARDING $\sum x_{K-1} y_K + \sum x_K y_{K-1}$, $\sum x_K x_{K-1}$ AND $\sum y_K y_{K-1}$ FROM
FORMULA.

                 Prices.              R. R. Gross Ear.     Corr. Cl. Index.     Surplus Reserves.
                I    II(a) III(b)    I    II(a) III(b)    I    II(a) III(b)    I    II(a) III(b)
Raw figures   +.95  +.94            +.94  +.94            +.77  +.76            -.60  -.60
1st D.        +.78  +.78  +.94      +.80  +.80  +.94      +.32  +.32  +.76      -.72  -.71  -.60
2nd D.        +.78  +.78  +.79      +.74  +.74  +.81      +.23  +.23  +.34      -.74  -.74  -.72
3rd D.        +.78  +.78  +.79      +.74  +.74  +.77      +.21  +.21  +.23      -.75  -.75  -.74
4th D.        +.76  +.76  +.78      +.76  +.76  +.76      +.21  +.21  +.22      -.77  -.77  -.75
5th D.        +.74  +.74  +.76      +.75  +.75  +.76      +.24  +.24  +.22      -.78  -.78  -.77
6th D.        +.71  +.71  +.73      +.74  +.74  +.74      +.28  +.28  +.26      -.81  -.81  -.79

(a) The results in columns II coincide with those in columns I because the sums of the variate
differences are approximately zero. See Table VIII.

(b) The results in columns III are equal to those of columns II of a lower order of difference.
Approximate steadiness is reached not because the terms discarded are approximately zero but because
they "balance." Evidence that the terms discarded are large is given by the coefficients of
correlation for a lag in Table IV.






Table VII gives the results of applying (I) the exact formula,
(II) the formula obtained by assuming $\sum(x_K - x_{K-1}) = 0$ and
$\sum(y_K - y_{K-1}) = 0$, and (III) the formula obtained by assuming
that the terms $(\sum x_{K-1} y_K + \sum x_K y_{K-1})$, $\sum x_K x_{K-1}$ and
$\sum y_K y_{K-1}$ balance and therefore may be discarded. These
formulae are applied to the business barometer and each of the series,
wholesale prices, railroad gross earnings, corrected clearing
index, and surplus reserves of New York associated banks.
Form II gives coefficients practically identical with those
resulting from the exact Form I, showing that the assumption
involved can safely be made. Table VIII illustrates the
negligibility of the algebraic sums by comparing the algebraic
sums of the second differences of various series with the
absolute sums of the same items. Form III of Table VII gives
coefficients which are equal to those for preceding differences,
as would be expected.

TABLE VIII.

ALGEBRAIC SUMS AND ABSOLUTE SUMS OF THE SECOND DIFFERENCES OF SIX SERIES
WITH RATIO OF FORMER TO LATTER.

Second Differences of:                  Algebraic Sum.   Absolute Sum.   Ratio.
Business barometer                             0              150          .00
Wholesale prices                             -11              157         -.07
Railroad gross earnings                      +22              312         +.07
Clearing index                                -7              277         -.03
Corrected clearing index                      -6              456         -.01
Surplus reserves of New York banks            +2              568          .00



Consideration of the effects that assumptions a, b, and c 
have upon the value of the coefficients for multiple differences, 
leads to the conclusion that assumptions a and c are in accord- 
ance with the facts and that making them will not, for series 
of 35 or more items, in general affect the coefficient of correla- 
tion by more than .01. Assumption b, however, does not 
usually hold true. Nevertheless, stability is frequently 
secured by a balance between the terms appearing in numerator 
and denominator. Instability of the coefficients is explainable 
by lack of such balance. The fulfilling of assumption b is
sufficient but not necessary to secure stability. Moreover,
stability is usually secured when assumption b does not hold






true. In view of this conclusion, what significance has the
variate difference correlation method for short economic time 
series? This question will now be considered. 

IV. 

The items of annual time series of economic data may be
conceived to be constituted of the following elements or
component parts:

First, the secular trend or growth element due to the in- 
crease of population and development of industry; 

Second, cyclical fluctuations, extending over a number of 
years and having a greater or less degree of periodicity, due 
to the alternating periods of business prosperity and depres- 
sion; 

Third, irregular fluctuations from year to year due to the 
influence of accidental or, at any rate, unpredictable events 
such as inventions, striking changes in fashion, or war. 

FIGURE 1.




A— REPRESENTS ECONOMIC DATA. 

B— REPRESENTS CYCLICAL FLUCTUATION. 

C— REPRESENTS SECULAR TREND. 




An idealistic representation of a time series of economic 
data containing the three elements, secular trend, cyclical, 
and irregular fluctuation is given in Figure 1. For particular 
data the secular trend may, of course, be other than a straight 
line and the cyclical fluctuations other than the simple sine 
curve. 

The concept of secular trend or normal growth element for 
which I contend is that of an element which increases or de- 
creases regularly according to some principle. That principle 
may be linear, i. e., the addition of a constant amount in each 
time interval as assumed in Figure 1, or it may be the com- 
pound interest law, the addition of an equal percentage in 
each time interval, or it may be a second degree parabola, 
or some other law. In any case, however, the mathematical 
function assumed to represent the secular trend should not 
"fit" the cyclical or irregular fluctuations of the data. Now 
the process of taking multiple differences is equivalent, on the 
assumption that the series is an algebraic function of time, 
to reducing the degree of the function of t by one for each 
difference taken. Thus, if the secular trend be linear, if a 
straight line should be fitted to the data and if the deviations 
of the original series from the corresponding ordinates of the 
straight line are taken (the cycles), then the coefficient of 
correlation between the first differences of the original items 
and that between the first differences of the cycles are identi- 
cal. This theorem may be proven as follows: 
Let $x_0, x_1, x_2, \ldots, x_{n-1}$ be the original series of n items
and let $x = mt + b$ be the line of secular trend. The first
differences are of the form $x_K - x_{K-1}$. The cycles are of the form
$x_K - (m t_K + b)$. The first differences of the cycles are of the
form $x_K - x_{K-1} - m(t_K - t_{K-1})$. Since $t_K - t_{K-1}$ is a constant
for all values of K (being the time interval for which items
are taken) the first differences of the cycles differ from the
first differences of the original series by a constant. Hence
the coefficients of correlation obtained from them are
identical.
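
The theorem just proven can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it takes the first ten items of London clearings and Sauerbeck's prices (1868-1877) as they appear in Table IX, and it subtracts two arbitrarily chosen straight lines whose slopes and intercepts are hypothetical, not fitted trends. The helper names `pearson`, `first_differences`, and `cycles` are likewise illustrative. Since the proof holds for any straight line, the two coefficients should coincide.

```python
from math import sqrt


def pearson(a, b):
    """Ordinary product-moment coefficient of correlation."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def first_differences(series):
    return [series[k] - series[k - 1] for k in range(1, len(series))]


def cycles(series, m, b):
    """Deviations from the line x = m*t + b, with t = 0, 1, 2, ..."""
    return [x - (m * t + b) for t, x in enumerate(series)]


# First ten annual items, 1868-1877, read from Table IX.
clearings = [34, 36, 39, 48, 59, 61, 59, 57, 50, 50]
prices = [99, 98, 96, 100, 109, 111, 102, 96, 95, 94]

r_original = pearson(first_differences(clearings), first_differences(prices))

# Hypothetical trend lines; any slope and intercept give the same result.
r_cycles = pearson(
    first_differences(cycles(clearings, 2.5, 40.0)),
    first_differences(cycles(prices, -0.6, 101.0)),
)

print(round(r_original, 6), round(r_cycles, 6))
```

Subtracting a line removes the same constant from every first difference, and the coefficient of correlation is unchanged when a constant is added to either variable.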

The coefficient of correlation between second differences is
identical with that found between the corresponding deviations
of items from the smoothed curve obtained by taking
three-year averages.* This theorem may be proven as follows:

Let $x_0, x_1, x_2, \ldots, x_{n-1}$ be the original series. The
second differences are of the form $x_{K+1} - 2x_K + x_{K-1}$. The
deviation of any item from the three-year average is

$$x_K - \frac{x_{K+1} + x_K + x_{K-1}}{3} \quad \text{or} \quad \frac{-x_{K+1} + 2x_K - x_{K-1}}{3}.$$

The form just given is obtainable from the form for second differences by
multiplying by $-\tfrac{1}{3}$. Therefore the coefficients of correlation
between second differences are the same as those between
deviations from three-year averages.†
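
The same identity can be confirmed on actual figures. The following is a minimal sketch, assuming the first ten items of London clearings and Sauerbeck's prices as read from Table IX; the function names are illustrative and not part of the paper. It checks both that each deviation from the centered three-year average equals minus one third of the corresponding second difference, and that the two coefficients of correlation therefore agree.

```python
from math import sqrt


def pearson(a, b):
    """Ordinary product-moment coefficient of correlation."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def second_differences(s):
    return [s[k + 1] - 2 * s[k] + s[k - 1] for k in range(1, len(s) - 1)]


def deviations_from_three_year_average(s):
    return [s[k] - (s[k + 1] + s[k] + s[k - 1]) / 3 for k in range(1, len(s) - 1)]


# First ten annual items, 1868-1877, read from Table IX.
clearings = [34, 36, 39, 48, 59, 61, 59, 57, 50, 50]
prices = [99, 98, 96, 100, 109, 111, 102, 96, 95, 94]

r_second = pearson(second_differences(clearings), second_differences(prices))
r_average = pearson(
    deviations_from_three_year_average(clearings),
    deviations_from_three_year_average(prices),
)

print(round(r_second, 6), round(r_average, 6))
```

Both series are multiplied by the same factor, so the sign change cancels in the product and the coefficient is unaffected.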

* Moore has correlated deviations from three-year averages of crop yield per acre (weighted index
of nine leading crops) and production of pig-iron. He does not appear to realise that his coefficients
are the same as those for second differences. The coefficient for crops and pig-iron production with
a one-year lag of the latter ($r_{+1}$) for the period 1870-1911 is .254; for crops and general prices
for the same period the coefficient is .303. See Moore's Economic Cycles, pp. 107, 118.

† (Theorem for form of mth difference.)
Theorem: The mth difference of the terms of a series

$x_1, x_2, x_3, \ldots, x_n$

is of the form

$$X_K = \left[ C^m_m x_{K+m} - C^m_{m-1} x_{K+m-1} + C^m_{m-2} x_{K+m-2} - \cdots + (-1)^m C^m_0 x_K \right]$$

where $C^m_i$ indicates the number of combinations of m things taken i at a time.

The first differences of the series $x_1, x_2, \ldots, x_n$ are
$x_2 - x_1,\; x_3 - x_2,\; \ldots,\; x_{K+1} - x_K,\; \ldots,\; x_n - x_{n-1}$.

The second differences are
$x_3 - 2x_2 + x_1,\; \ldots,\; x_{K+1} - 2x_K + x_{K-1},\; \ldots,\; x_n - 2x_{n-1} + x_{n-2}$.

Assume the mth difference to be of the form

$$x_{K+m} - m x_{K+m-1} + \frac{m(m-1)}{2!} x_{K+m-2} - \cdots + (-1)^m x_K$$

and the next following mth difference, accordingly,

$$x_{K+m+1} - m x_{K+m} + \frac{m(m-1)}{2!} x_{K+m-1} - \cdots + (-1)^m x_{K+1}.$$

Taking the difference we have

$$x_{K+m+1} - (m+1) x_{K+m} + \frac{m(m-1) + 2m}{2!} x_{K+m-1} - \cdots + (-1)^i \left[ \frac{m!}{i!\,(m-i)!} + \frac{m!}{(i-1)!\,(m-i+1)!} \right] x_{K+m-i+1} + \cdots$$

Consider the coefficient of the (i+1)th term:

$$\frac{m!}{i!\,(m-i)!} + \frac{m!}{(i-1)!\,(m-i+1)!} = m! \left[ \frac{(m-i+1) + i}{i!\,(m-i+1)!} \right],$$

which reduces to

$$\frac{(m+1)!}{i!\,(m+1-i)!} = C^{m+1}_i.$$

Since we have shown that the 2nd differences are expressions with the binomial coefficients corresponding
to the order of difference and since, assuming that the mth difference appears with the binomial
coefficients corresponding to the terms of the expansion of $(1+1)^m$, we have proven that the (m+1)th
difference has as coefficients the terms of the expansion of $(1+1)^{m+1}$, then, by mathematical induction,
the proposition stated is true.

Professor A. A. Young has called my attention to the fact that the foregoing theorem is
demonstrated in the Institute of Actuaries' Text Book, 2nd ed., Part II, p. 427.

These theoretical considerations taken in connection with
applications and tests of the variate difference method lead
me to the following conclusions concerning the meaning of the
coefficients of correlation for the raw figures and their multiple
differences:

In general, significant coefficients of correlation for the raw
figures of two time series indicate the similarity of the growth
elements of the two series, if large growth elements exist. The
existence or non-existence of such elements is readily
determined graphically or by fitting a simple function to the data.

Significant coefficients of correlation for first differences
indicate that the cyclical fluctuations synchronize, if there be
cyclical fluctuations. Evidence of such cycles may be secured
by plotting the deviations from the assumed secular trend.

Significant coefficients of correlation for second and in some
cases higher differences indicate, in general, that the irregular
fluctuations synchronize. Coefficients for higher differences
of short series contain a large spurious element which increases
with the order of difference. This element is due to the
tendency of the items to alternate in sign.

These conclusions assume that the magnitude of the elements
due to secular trend, business cycles, and irregular events
vary in the order in which these elements are named. In
case we are dealing with data having marked cycles (of the
same variety) and are interested in the correlation of the cycles
the coefficients for the cycles and first differences constitute
the proper basis for judgment rather than coefficients for
higher differences. Stability of coefficients for higher
differences, in such a case, probably means that the influence of all
the fluctuations except the irregular ones, i. e., the oscillations,
has been eliminated by the variate difference process.
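
The footnote theorem on the form of the mth difference can be checked numerically. The sketch below is an illustration only: it compares repeated differencing with the closed binomial form on the first ten London clearings figures as read from Table IX. The function names are illustrative, and `math.comb` (Python 3.8 or later) supplies the number of combinations.

```python
from math import comb


def difference(series, m):
    """Take first differences m times in succession."""
    out = list(series)
    for _ in range(m):
        out = [out[k + 1] - out[k] for k in range(len(out) - 1)]
    return out


def difference_binomial(series, m):
    """mth difference written out with alternating binomial coefficients."""
    return [
        sum((-1) ** i * comb(m, i) * series[k + m - i] for i in range(m + 1))
        for k in range(len(series) - m)
    ]


# First ten annual items of London clearings, 1868-1877, read from Table IX.
clearings = [34, 36, 39, 48, 59, 61, 59, 57, 50, 50]

for m in range(1, 7):
    assert difference(clearings, m) == difference_binomial(clearings, m)

print(difference(clearings, 2))
print(difference_binomial(clearings, 2))
```

Each differencing shortens the series by one item, so a sixth difference of a series of n items leaves only n - 6 items to correlate.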






TABLE IX.

LONDON CLEARINGS (COLUMN 1) WITH THE 9-YEAR MOVING AVERAGE (2) AND DEVIATIONS
FROM MOVING AVERAGE (3), STRAIGHT LINE (4), PARABOLA (5) AND COMPOUND
INTEREST LAW (6); SAUERBECK'S PRICES (7) WITH THE 9-YEAR MOVING
AVERAGE (8) AND DEVIATIONS FROM MOVING AVERAGE (9), STRAIGHT LINE (10)
AND PARABOLA (11); 1868-1913. ALSO DEVIATIONS OF LONDON CLEARINGS FROM
THE TWO STRAIGHT LINES FITTED TO DATA FOR 1868-1896 AND FOR 1897-1913
(12) AND DEVIATIONS OF SAUERBECK'S PRICES FROM TWO STRAIGHT LINES
SIMILARLY FOUND (13).

Year. 1.(a)    2.    3.    4.    5.    6. 7.(b)    8.    9.   10.   11.   12.   13.
1868     34                +6   -18    -3    99                +8   -11   -10    -7
1869     36                +5   -15    -2    98                +8   -10    -9    -6
1870     39                +6   -11    -1    96                +6    -9    -7    -6
1871     48               +13    -2    +7   100               +11    -2    +1    -1
1872     59    49   +10   +21   +10   +17   109   101    +8   +21   +10   +11   +10
1873     61    51   +10   +21   +12   +17   111   100   +11   +23   +14   +12   +13
1874     59    53    +6   +17   +10   +15   102    99    +3   +15    +7    +9    +6
1875     57    54    +3   +12    +9   +11    96    97    -1   +10    +4    +6    +2
1876     50    55    -5    +3    +1    +2    95    96    -1    +9    +5    -2    +2
1877     50    55    -5    +1    +1    +1    94    93    +1    +9    +6    -3    +3
1878     50    56    -6    -1    +1    -1    87    90    -3    +3    +1    -4    -2
1879     49    56    -7    -5    -1    -3    83    88    -5    -1    -2    -7    -5
1880     58    56    +2    +2    +8    +4    88    86    +2    +5    +5    +1    +2
1881     64    56    +8    +6   +14    +8    85    83    +2    +2    +4    +6    +1
1882     62    57    +5    +1   +10    +5    84    81    +3    +2    +4    +3    +1
1883     59    58    +1    -4    +6     0    82    79    +3    +1    +4    -1    +1
1884     58    61    -3    -7    +3    -3    76    77    -1    -5    -1    -3    -3
1885     55    62    -7   -13    -1    -8    72    75    -3    -8    -4    -7    -6
1886     59    64    -5   -11    +2    -6    69    74    -5   -10    -5    -4    -7
1887     61    65    -4   -11    +2    -6    68    73    -5   -11    -5    -3    -7
1888     69    65    +4    -6    +8    +1    70    71    -1    -8    -2    +4    -3
1889     76    66   +10    -1   +13    +5    72    70    +2    -6    +1   +10    +1
1890     78    67   +11    -1   +13    +5    72    69    +3    -5    +1   +10    +2
1891     69    69     0   -13    +2    -7    72    68    +4    -4    +2     0    +4
1892     65    71    -6   -19    -5   -13    68    68     0    -8    -2    -5    +2
1893     65    71    -6   -21    -7   -15    68    67    +1    -7    -1    -6    +3
1894     63    72    -9   -25   -12   -19    63    66    -3   -11    -6    -9     0
1895     76    73    +3   -15    -2    -9    62    65    -3   -12    -7    +3    +1
1896     76    76     0   -17    -5   -12    61    66    -5   -12    -8    +2    +1
1897     75    79    -4   -20    -9   -16    62    66    -4   -10    -7     0    -2
1898     81    83    -2   -17    -7   -12    64    66    -2    -8    -5    +1    -2
1899     92    87    +5    -8    +1    -5    68    67    +1    -3    -1    +7    +1
1900     90    91    -1   -12    -5   -10    75    68    +7    +4    +6    -1    +7
1901     96    96     0    -9    -2    -7    70    69    +1     0    +1     0    +1
1902    100   102    -2    -7    -3    -5    69    70    -1     0    -1    -1    -1
1903    101   107    -6    -8    -6    -8    69    72    -3     0    -1    -6    -2
1904    106   110    -4    -6    -5    -7    70    73    -3    +2    -1    -6    -2
1905    123   115    +8    +9    +8    +7    72    73    -1    +5     0    +6    -2
1906    127   121    +6   +11    +7    +8    77    74    +3   +10    +4    +5    +2
1907    127   126    +1    +9    +2    +4    80    75    +5   +14    +6    -1    +4
1908    121   132   -11     0    -8    -6    73    76    -3    +7    -2   -12    -4
1909    135   139    -4   +12    +1    +5    74    78    -4    +9    -2    -3    -4
1910    147               +22    +8   +12    78               +14    +1    +3    -1
1911    146               +18    +1    +7    80               +16    +1    -3    -1
1912    160               +30   +10   +16    85               +22    +5    +6    +3
1913    164               +32    +8   +17    85               +23    +3    +4    +2

(a) In £100,000,000. (b) Relative Indices.



Judgment concerning the correlation of cyclical fluctuations 
of two series must be preceded by elimination of the secular 
trend. The choice of a function to represent the secular trend, 
indeed the choice of the method of eliminating the trend, 






whether by curve fitting or otherwise, these are questions 
fundamental to the process. I will test the effect that various 
suppositions concerning the secular trend have upon the corre- 
lation coefficients between the deviations from the various 
trends (cycles) resulting from the suppositions. The two
series chosen for this test are London clearings and Sauer- 
beck's price indices given with other data in Table IX. These 
series are chosen, first, because the secular trends are dis- 
similar, second, because the trends differ most widely from 
the linear of any which could be found, and, third, because 
the variate difference correlation coefficients for these series 
are puzzling to the author of the variate difference correla- 
tion method. "Student" applied his method to London 
clearings per capita and Sauerbeck's prices and to marriage 
rate and wages, finding the following coefficients: 



                                Raw Figures.  1st D.  2nd D.  3rd D.  4th D.  5th D.  6th D.
I. CLEARINGS AND PRICES.            -.33       +.51    +.30    +.07    +.11    +.05
II. MARRIAGE RATE AND WAGES.        -.52       +.67    +.58    +.52    +.55    +.58    +.55



He says, "The difference between I and II is very marked,
and would seem to indicate that the causal connection between
index numbers and Bankers' clearing house rates is not
altogether of the same kind as that between marriage rate and
wages, though all four variables are commonly taken as
indications of the short period trade wave."*

* Biometrika, April, 1914, p. 180.

Figure 2 presents Sauerbeck's indices with linear and
parabolic secular trends, the functions being fitted to the data
by the method of moments. Figure 3 presents London bank
clearings with linear, parabolic, and exponential functions
fitted to the data. Figure 4 presents both series with their
respective nine-year moving averages, nine years being
determined by inspection as the length of the business wave or
cycle. Figures 5, 6, 7, and 8 present the deviations, positive
or negative, of the two series from the various secular trends.
The last named figures all show a striking correspondence of
the cyclical fluctuations of the two series. It will be noticed
that fluctuations of clearings show a tendency to precede or
forecast the fluctuations in prices.

Figures 5, 6, 7, and 8 throw some light on Babson's hypothe- 
sis that economic action and reaction are equal, i. e., that 
consecutive areas above and below the line of normal growth 
should be equal for a correct normal line. It is true that the 
sum of areas above and below the lines are roughly equal; that, 
the method of curve fitting accomplishes. But what are con- 
secutive areas, the long-time areas of Figures 6 and 8 or the 
short-time areas of Figures 5 and 7? It is obvious that we 
would get still more heterogeneous results, as regards positive 
and negative areas, if we should use series of various lengths, 
say 20 years or 70 years, or series of monthly or quarterly 
rather than annual data. 



FIGURE 2.

SAUERBECK'S INDEX NUMBERS OF WHOLESALE PRICES, 1868-1913, WITH STRAIGHT
LINE AND PARABOLA FITTED TO DATA. EQUATIONS OF LINE AND CURVE:

(A) y = -.633 t + 76.25

(B) y = +.051 t^2 - .581 t + 70.18

(Origin at 1891)
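
The two trend equations printed with Figure 2 can be checked against columns 10 and 11 of Table IX. The sketch below is an illustration under stated assumptions: it reads equation (A) as y = -.633t + 76.25 and equation (B) as y = +.051t^2 - .581t + 70.18, with t counted in years from 1891, and it takes the price index for three sample years from Table IX. The function and variable names are illustrative only.

```python
def line(t):
    """Equation (A): straight-line trend, t in years from 1891."""
    return -0.633 * t + 76.25


def parabola(t):
    """Equation (B): parabolic trend, t in years from 1891."""
    return 0.051 * t * t - 0.581 * t + 70.18


# Sauerbeck's index for three sample years, read from column 7 of Table IX.
prices = {1869: 98, 1891: 72, 1913: 85}

deviations = {}
for year, index in prices.items():
    t = year - 1891
    deviations[year] = (round(index - line(t)), round(index - parabola(t)))

print(deviations)
# {1869: (8, -10), 1891: (-4, 2), 1913: (23, 3)}
```

The rounded deviations agree with the entries of Table IX for those years (+8 and -10, -4 and +2, +23 and +3), which supports this reading of the signs in the two equations.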






FIGURE 3.



FIGURE 4.

SAUERBECK'S PRICE INDICES (P) AND LONDON CLEARINGS (C), 1868-1913, WITH
THEIR RESPECTIVE NINE-YEAR MOVING AVERAGES, 1872-1909.






FIGURE 5.

DEVIATIONS OF LONDON CLEARINGS (C) AND SAUERBECK'S PRICE INDICES (P)
FROM THEIR RESPECTIVE NINE-YEAR MOVING AVERAGES AS
SECULAR TRENDS, 1872-1909.



FIGURE 6.

DEVIATIONS OF LONDON CLEARINGS (C) AND SAUERBECK'S PRICES (P) FROM
THEIR RESPECTIVE LINEAR SECULAR TRENDS, 1868-1913.






FIGURE 7 




DEVIATIONS OF LONDON CLEARINGS (C) AND SAUERBECK'S PRICES (P) FROM 
THEIR RESPECTIVE PARABOLIC SECULAR TRENDS, 1868-1913. 



FIGURE 8.

DEVIATIONS OF LONDON CLEARINGS FROM TREND AS COMPOUND INTEREST
CURVE (C) AND OF SAUERBECK'S PRICES FROM LINEAR TREND (P), 1868-1913.






TABLE X.

COEFFICIENTS OF CORRELATION BETWEEN SAUERBECK'S PRICE INDICES AND
LONDON CLEARINGS, 1868-1913.

A. Raw Figures and their Differences.

B. Deviations from 9-year Moving Average and Differences.

C. Deviations from Straight Lines and Differences.

D. Deviations from Parabolas and Differences.

E. Deviations from Compound Interest Law for Clearings and Straight Line for Prices and
Differences.

                         $r_{-2}$   $r_{-1}$   $r_0$    $r_{+1}$   $r_{+2}$
A.
Raw Figures                        -.37     -.31     -.28
First Differences                  +.05     +.42     +.37     +.12
Second Differences                 -.20     +.31     +.14
Third Differences                  -.27     +.26     +.03
Fourth Differences                 -.26     +.23     -.01
Fifth Differences                  -.23     +.23     -.06
Sixth Differences                  -.21     +.21     -.08

B.
Deviations                -.27     +.19     +.64     +.69     +.36
First Differences                  -.06     +.42     +.37
Second Differences                 -.15     +.22     +.10
Third Differences                  +.10     +.15     +.02

C.
Deviations                +.75     +.85     +.92     +.90     +.80
First Differences                  +.09     +.55     +.43
Second Differences                 -.21     +.35     +.08
Third Differences                  -.26     +.31     -.04

D.
Deviations                +.20     +.49     +.73     +.65     +.33
First Differences                  +.06     +.49     +.36
Second Differences                 -.22     +.35     +.06
Third Differences                  -.28     +.31     -.06

E.
Deviations                +.67     +.75     +.82     +.76     +.59
First Differences                  +.02     +.50     +.38
Second Differences                 -.26     +.33     +.13
Third Differences                  -.32     +.35      .00

The main question upon which we wish to get light is, however,
the effect of the various methods of eliminating the
secular trend upon the coefficients of correlation between
corresponding deviations. Table X gives the coefficients of
correlation between Sauerbeck's indices and London clearings
taking the raw figures and their differences, first to sixth, and
also taking deviations from various secular trends and their
differences, first to third. Coefficients are presented for
concurrent items and for a lag in both directions, a lag of one for
differences and a lag of one and of two for deviations or cycles.

The coefficients of correlation for the raw figures (-.37,
-.31 and -.28) show that the secular trends of prices and
clearings are in opposite directions. The coefficients for the
first differences of the raw figures and of all the deviations
indicate an appreciable positive correlation for concurrent
items ($r'_0$) and for prices one-year lag ($r'_{+1}$), with the coefficient
$r'_0$ larger. The coefficients for second and higher differences
of the raw figures, and deviations as well, decrease as the order
of difference increases; the coefficients for one-year lag of prices
decreasing more rapidly than for concurrent items. These
facts indicate that the maximum correlation of business cycles
(including the irregular fluctuations) is for clearings preceding
prices by less than half a year, say, four months. There is,
however, an unknown element of spurious correlation between
clearings and prices because the former are dependent upon
prices as well as physical volume of goods exchanged and
speculation. If this spurious element, due not to the method
but to the nature of the data, were excluded, it is probable
that the maximum correlation would be found for clearings
preceding prices by more than six months, perhaps by a year.

The coefficients of correlation for the deviations all agree in
locating the maximum, and therefore the lag of prices, at less
than a year. The actual maximum found was for concurrent
items, except for deviations from the nine-year averages, which
give a maximum at one year lag of prices. Since our judgment
is based upon the relative values of the coefficients for
various degrees of lag, rather than upon their absolute values,
the type of secular trend chosen does not appear to have great
significance. Curve-fitting, however, does appear to be preferable
to the taking of moving averages because, first, all the
items may be used in determining the correlation and, second,
the coefficients for deviations and first differences disagree in
their location of the maximum when deviations from the moving
average are taken but agree in all other cases. Of course
it might be possible to use the deviations of all the terminal
items from the moving average, if values of the latter were
extrapolated; fitting a curve, graphically or otherwise, to the
moving average is the obvious solution of this problem.

The results appearing in Table X make it clear that the
coefficients for higher differences give little indication of the
correspondence of the business cycles in the two series, which
correspondence is clearly shown by the charts and the correlation
coefficients for deviations from the secular trend and for
first differences.
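
The judgment of lag described above rests on comparing coefficients computed for several degrees of lag. The following is a minimal sketch of such a comparison, not the author's procedure: the functions `pearson` and `lagged_r` are illustrative names, the sign convention (a positive lag means the second series follows the first) is an assumption chosen to match "lag of prices (+)", and the two short series are hypothetical figures constructed so that the second repeats the first one item later.

```python
from math import sqrt


def pearson(a, b):
    """Ordinary product-moment coefficient of correlation."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def lagged_r(x, y, lag):
    """Correlate x[k] with y[k + lag]; positive lag means y follows x."""
    if lag >= 0:
        a, b = x[: len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[: len(y) + lag]
    return pearson(a, b)


# Hypothetical series: y repeats x one item later.
x = [1, 3, 2, 5, 4, 6, 5, 8]
y = [0] + x[:-1]

coefficients = {lag: lagged_r(x, y, lag) for lag in (-1, 0, 1)}
best_lag = max(coefficients, key=coefficients.get)

for lag, r in coefficients.items():
    print(lag, round(r, 3))
print("maximum at lag", best_lag)
```

Each lag discards one pair of items per step, so with short series the coefficients for different lags rest on slightly different sets of years.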



If our interest were in the absolute degree of correlation 
between the cycles of two series for selected pairs of items, 
the nature of the curves used to represent the secular trends 
and the closeness of the fit to the data would be of primary im- 
portance. In case we are dealing with series in which the 
secular trends are non-linear, such as clearings and prices, but 
if, nevertheless, we use straight lines to represent the trends 
and correlate deviations therefrom, the coefficients resulting 
will undoubtedly contain a large spurious element, positive 
or negative. This is illustrated by the discrepancy between 
the coefficients, +.92 and +.73, obtained for deviations from 
straight lines and parabolas, respectively, of London clearings 
and Sauerbeck's prices (Table X). The former coefficient 
(+.92) undoubtedly contains a spurious element amounting 
to some twenty points. The spurious element is positive in 
this case, apparently, because of the downward long time 
trend from 1868 to 1896 and the upward trend from 1897 to 
1913 which results in the pairing of large negative deviations 
for the period 1884-1899, when the deviations are from straight 
fines. 

To test the effect of dividing the data of Sauerbeck's prices 
and London clearings into two homogeneous sub-periods, t. c. 
one of falling prices, 1868-1896, and one of rising prices, 
1897-1913, two linear secular trends were found for each series 
and coefficients of correlation were computed for deviations 
from these trends and for their first differences. The lines 
and their equations are given in Figure 9; the deviations appear 
in Figure 10; the coefficients are presented in Table XI. 
The maximum coefficients for the deviations are $r_0 = +.71$ and 
$r_{+1} = +.70$ for the periods 1868-1896 and 1897-1913, respectively. 
The maximum coefficients for corresponding first 
differences, $r'_0 = +.61$ and $r'_{+1} = +.47$, are consistent with 
those obtained for the entire period as shown by Table X. 
The coefficients for first differences, $r'_0$, between deviations 
from various trends (see A, B, C, D, and E of Table X) are 
+.42, +.42, +.55, +.49, and +.50. 
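
The coefficients with subscripts quoted above are obtained by pairing each item of one series with the item of the other series k years later. A minimal sketch of that pairing follows. The function names and the small data set are hypothetical, and the sign convention (a positive subscript meaning that the second series lags) is an assumption taken from the caption of Table XI.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment coefficient of correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def lagged_r(lead, follow, k):
    """r_k: pair lead[t] with follow[t + k]; k > 0 means `follow` lags by k items."""
    if k >= 0:
        a, b = lead[:len(lead) - k], follow[k:]
    else:
        a, b = lead[-k:], follow[:len(follow) + k]
    return pearson(a, b)

def first_differences(series):
    """Differences between successive items of a series."""
    return [b - a for a, b in zip(series, series[1:])]

# Hypothetical data: `prices` repeats `clearings` one item later.
clearings = [0, 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
prices = [7] + clearings[:-1]

r = {k: lagged_r(clearings, prices, k) for k in range(-1, 4)}
r_diff = {k: lagged_r(first_differences(clearings), first_differences(prices), k)
          for k in range(-1, 4)}
best_lag = max(r, key=r.get)              # lag at which r_k is a maximum
best_lag_diff = max(r_diff, key=r_diff.get)
print(best_lag, best_lag_diff)
```

Each additional year of lag drops one pair of items, which is one reason why coefficients at long lags are less trustworthy for short series.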



TABLE XI. 

COEFFICIENTS OF CORRELATION BETWEEN DEVIATIONS OF SAUERBECK'S PRICES 
AND OF LONDON CLEARINGS FROM THEIR RESPECTIVE LINEAR SECULAR TRENDS 
FOR THE TWO PERIODS 1868-1896 AND 1897-1913 TOGETHER WITH COEFFICIENTS 
FOR FIRST DIFFERENCES; VARIOUS DEGREES OF LAG OF PRICES (+) AND OF 
CLEARINGS (-). 

                                   Coefficients of Correlation. 
Period.     Items Paired.        r_{-1}    r_0      r_{+1}    r_{+2}    r_{+3} 
1868-1896   Deviations           +.38      +.71     +.63      +.34      -.07 
            First Differences    +.20      +.61     +.34      +.16      -.07 
1897-1913   Deviations           -.27      +.43     +.70      +.20      -.34 
            First Differences    -.40      +.27     +.47      +.04      -.40 







Division of the data into two sections throws new light on the 
problem. Clearings and prices fluctuated concurrently during 
the first period, but prices lagged behind clearings by a year 
during the period 1897-1913. Perhaps increased speculation 
has changed the character of clearings during the second period. 
Whatever may be the cause, the fluctuations of English prices 
and clearings are shown to be related in the same fashion as are 
those for the United States during the same period (see AD, 
BD, and CD of Table IV). 






TABLE XII. 

COEFFICIENTS OF CORRELATION BETWEEN RELATIVE WHOLESALE PRICES AND PIG-IRON 
PRODUCTION OF THE UNITED STATES, DEVIATIONS AND FIRST DIFFERENCES 
AS FOLLOWS: 

1879-1913. 
A. Deviations from Linear Secular Trends and First Differences. 
B. Deviations of Prices from Parabola and of Pig-iron Production from Compound Interest Curve, and Their First Differences. 

1879-1896. 
C. Deviations from Linear Secular Trends and Their First Differences. 

1897-1913. 
D. Deviations from Linear Secular Trends and Their First Differences. 

1879-1896. 
E. Deviations of Prices from Linear Secular Trend, and of Pig-iron Production from Compound Interest Curve (Computed for Data 1879-1913). 

1897-1913. 
F. Deviations of Prices from Linear Secular Trend, and of Pig-iron Production from Compound Interest Curve (Computed for Data 1879-1913). 

1879-1913. 
G. Deviations of Prices from the Two Linear Secular Trends (1879-1896 and 1897-1913) as a Continuous Series and of Pig-iron Production from Compound Interest Curve. 
H. First Differences of Raw Items. 

Prices concurrent (0), lag (+), or previous (-) to pig-iron production as indicated by subscript of r. 





                                                          Coefficients of Correlation. 
Symbol.  Period.     Items Paired.                        r_{-1}        r_0           r_{+1}        r_{+2}        r_{+3} 
(A)      1879-1913   Deviations (a) / First Differences   +.31          +.1& / +.41   +.74 / +.21   +.63          +.62 
(B)      1879-1913   Deviations / First Differences       +.05 / -.37   +.48 / +.40   +.51 / +.22 
(C)      1879-1896   Deviations / First Differences                     +.48 / +.23   +.61 / +.55   +.41 / .00    +.31 
(D)      1897-1913   Deviations / First Differences       -.65          +.41 / +.54   +.35 / +.17   -.06          -.06 
(E)      1879-1896   Deviations                                         +.54          +.69          +.49          +.20 
(F)      1897-1913   Deviations                                         +.55          +.48          +.10          +.11 
(G)      1879-1913   Deviations                                         +.45          +.44          +.19          +.10 
(H)      1879-1913   First Differences                    -.31          +.41          +.22          -.10          +.07 



(a) Coefficient r^_^=+.53. 




Table XII presents coefficients of correlation between de- 
viations from various secular trends and their first differences 
of relative wholesale prices and pig-iron production of the 
United States for the period 1879-1913 and the sub-periods, 
1879-1896 and 1897-1913. The various secular trends and 
their equations are given in Figures 11 and 12; the devia- 
tions appear in Figures 13 and 14; the data appear in Table 
XIII. 
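
One common way of fitting a "compound interest curve," $y = a(1+g)^t$, is to fit a straight line by least squares to the logarithms of the items. The sketch below assumes that procedure, which the text does not specify, so it need not reproduce the curve of Figure 12. The function name `fit_compound_interest` and the variable names are hypothetical. The data are the pig-iron figures of column 5 of Table XIII.

```python
from math import exp, log

def fit_compound_interest(y):
    """Fit y_t = a * (1 + g)**t, t = 0, 1, ..., by least squares on log(y_t)."""
    n = len(y)
    t = range(n)
    ly = [log(v) for v in y]
    mt, ml = sum(t) / n, sum(ly) / n
    slope = (sum((ti - mt) * (li - ml) for ti, li in zip(t, ly))
             / sum((ti - mt) ** 2 for ti in t))
    a = exp(ml - slope * mt)          # ordinate of the curve at t = 0
    return a, exp(slope) - 1.0        # (a, annual rate of growth g)

# Pig-iron production, 1879-1913 (Table XIII, column 5; units of 100,000 long tons).
pig_iron = [27, 38, 41, 46, 46, 41, 40, 57, 64, 65, 76, 92, 83, 92, 71, 67, 94, 86,
            97, 118, 136, 138, 159, 178, 180, 165, 230, 253, 258, 159, 258, 273,
            236, 297, 310]

level, growth_rate = fit_compound_interest(pig_iron)
# Deviations of the items from the fitted curve, in the units of the data.
deviations = [v - level * (1 + growth_rate) ** t for t, v in enumerate(pig_iron)]
print(round(level, 1), round(growth_rate, 3))
```

A fit on logarithms minimizes proportional rather than absolute deviations, so the deviations computed this way may differ somewhat from those in column 7 of Table XIII.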

For the period 1879-1913 (see A, B, G, and H) the cycles of 
prices and pig-iron production are concurrent. For the period 
1879-1896 (see C and E) the pig-iron cycles precede price 
cycles by a year. For the period 1897-1913 the cycles of the 
two series are strongly concurrent (see D and F). The 
coefficients of correlation for deviations from the linear trends, 
1879-1896, $r_0 = +.48$, $r_{+1} = +.61$, and $r_{+2} = +.41$, agree in 
locating the maximum at the same point as those for first 
differences, $r'_0 = +.23$, $r'_{+1} = +.55$, and $r'_{+2} = .00$ (see C). 
The coefficients for deviations from the linear trends, 1897- 
1913, $r_0 = +.41$ and $r_{+1} = +.35$, are likewise supported by those 
for first differences, $r'_0 = +.54$ and $r'_{+1} = +.17$ (see D). Using 
deviations of prices from the two linear secular trends as a 
continuous series and of pig-iron production from the compound 
interest curve, 1879-1913, we have the coefficients 
$r_0 = +.45$ and $r_{+1} = +.44$ (see G). It is obvious that the coefficients 
for the whole period and the two sub-periods are consistent. 
At present general prices and pig-iron production 
fluctuate concurrently. 






FIGURE 9. 

FIGURE 10. 

DEVIATIONS OF LONDON CLEARINGS (C) AND SAUERBECK'S PRICE INDICES (P) 
FROM THEIR RESPECTIVE LINEAR SECULAR TRENDS FOR THE 
TWO PERIODS 1868-1896 AND 1897-1913. 



FIGURE 11. 

WHOLESALE PRICE INDICES FOR THE UNITED STATES WITH STRAIGHT 
LINE AND PARABOLA FITTED TO DATA, 1879-1913, AND TWO STRAIGHT 
LINES FITTED TO THE DATA FOR THE PERIODS 1879-1896 AND 1897- 
1913, RESPECTIVELY. EQUATIONS OF LINES AND CURVE: 
1879-1913 (A) y = +0.1475t + 113.9 
1879-1913 (B) y = +0.128t^2 + 0.159t + 103.5 
1879-1896 (C) y = -2.10t + 131.7 






FIGURE 13. 

DEVIATIONS OF UNITED STATES PRICE INDICES (P) FROM PARABOLA AND PIG- 
IRON PRODUCTION (I) FROM COMPOUND INTEREST CURVE, 1879-1913. 



FIGURE 14. 

DEVIATIONS OF UNITED STATES PRICE INDICES (P) AND PIG-IRON PRODUCTION 
(I) FROM THEIR RESPECTIVE LINEAR SECULAR TRENDS FOR 
THE TWO PERIODS, 1879-1896 AND 1897-1913. 






TABLE XIII. 
WHOLESALE PRICE INDICES FOR THE UNITED STATES (COLUMN 1) WITH THE DEVIATIONS 
FROM STRAIGHT LINE (2) AND PARABOLA (3) 1879-1913, AND FROM TWO 
STRAIGHT LINES, 1879-1896 AND 1897-1913 (4) AS SECULAR TRENDS. ALSO PIG-IRON 
PRODUCTION IN THE UNITED STATES (5) WITH THE DEVIATIONS FROM STRAIGHT 
LINE (6) AND FROM COMPOUND INTEREST LAW (7) 1879-1913 AND FROM TWO 
STRAIGHT LINES, 1879-1896 AND 1897-1913 (8) AS SECULAR TRENDS. 
(Equations of Secular Trends in Figures 11 and 12.) 

Year.   1.(a)     2.      3.      4.    5.(b)  6.(c)    7.     8. 
1879    118.2   +5.3   -18.5   -13.5      27    +30    -7     -5 
1880    130.8  +17.8    -1.8    +1.2      38    +33    +2     +2 
1881    129.3  +16.1    +0.4    +1.8      41    +28    +2     +2 
1882    132.7  +19.4    +7.4    +7.3      46    +25    +5     +3 
1883    129.7  +16.2    +7.7    +6.4      46    +18    +2      0 
1884    121.6   +8.0    +2.6    +0.4      41     +5    -6     -9 
1885    113.8     .0    -2.5    -5.3      40     -4   -10    -14 
1886    112.4   -1.5    -1.3    -4.6      57     +5    +3      0 
1887    113.3    -.8    +1.9    -1.6      64     +4    +7     +3 
1888    115.2   +1.0    +5.8    +2.4      65     -3    +4     +1 
1889    115.2    +.8    +7.6    +4.5      76      0   +11     +8 
1890    112.9   -1.6    +6.8    +4.3      92     +9   +22    +20 
1891    111.7   -3.0    +6.8    +5.2      83     -8    +8     +8 
1892    106.1   -8.7    +2.2    +1.7      92     -7   +12    +13 
1893    105.6   -9.4    +2.4    +3.3      71    -36   -14    -11 
1894     96.1  -19.0    -6.6    -4.1      67    -48   -24    -19 
1895     93.6  -21.7    -8.9    -4.5      94    -29    -4     +4 
1896     90.4  -25.0   -12.1    -5.6      86    -44   -18     -7 

1897     89.7  -25.9   -13.0    -6.5      97    -41   -14    -10 
1898     93.4  -22.3    -9.9    -5.5     118    -28    -1     -1 
1899    101.7  -14.1    -2.4    +0.2     136    -18    +9     +5 
1900    110.5   -5.5    +5.3    +6.4     138    -24    +2     -5 
1901    108.5   -7.6    +2.0    +1.8     159    -11   +14     +4 
1902    112.9   -3.4    +4.8    +3.6     178      0   +23    +11 
1903    113.6   -2.8    +3.7    +1.7     180     -5   +14     +1 
1904    113.0   -3.6    +1.1    -1.5     165    -28   -12    -26 
1905    115.9    -.8    +1.6    -1.2     230    +29   +41    +27 
1906    122.5   +5.6    +5.6    +2.8     253    +44   +51    +38 
1907    129.5  +12.5    +9.7    +7.2     258    +41   +42    +32 
1908    122.8   +5.6    +0.1    -2.1     159    -66   -72    -79 
1909    126.5   +9.2    +0.4    -1.0     258    +25   +11     +8 
1910    131.6  +14.1    +1.8    -1.5     273    +33    +9    +11 
1911    129.2  +11.6    -4.5    -3.5     236    -12   -46    -38 
1912    133.6  +15.8    -4.1    -1.7     297    +41    -4    +11 
1913    135.2  +17.3    -6.9    -2.6     310    +46   -12    +12 



(a) The Aldrich and Bureau of Labor Statistics indices are reduced to a continuous series with the 
base 1890-1899. 

(b) From Statistical Abstract of the United States, 1914, p. 664. The units here are 100,000 long tons. 

(c) Equation of line of secular trend, y = 7.852t - 3.00, origin at 1879. This is the only trend having 
any negative ordinate for the period studied. 
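
The deviations of column 6 follow directly from the equation in footnote (c). As a brief check under that equation, the sketch below recomputes the first eight entries. The variable names are illustrative, and rounding to the nearest unit is an assumption about how the table was prepared.

```python
# Pig-iron production, 1879-1886, from column 5 of Table XIII
# (units of 100,000 long tons).
pig_iron = [27, 38, 41, 46, 46, 41, 40, 57]

def trend(t):
    """Ordinate of the linear secular trend of footnote (c); t = 0 at 1879."""
    return 7.852 * t - 3.00

# Deviation = item minus trend ordinate, rounded to the nearest unit.
deviations = [round(y - trend(t)) for t, y in enumerate(pig_iron)]
print(deviations)  # [30, 33, 28, 25, 18, 5, -4, 5], the first entries of column 6
```

The ordinate at the origin, `trend(0)`, is -3.00, which is the negative ordinate mentioned in the footnote.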



The conclusion just stated is of especial interest because it 
is in conflict with that of Professor H. L. Moore.* In his 
Economic Cycles, he found the following coefficients between 
the cycles of crop yield and pig-iron production, using three-year 
averages in all cases: $r_0 = .625$; $r_{+1} = .719$; $r_{+2} = .718$; 
$r_{+3} = .697$; $r_{+4} = .572$ (see Table XIV). Correlating the cycles of 
crop yield with cycles of general prices,† he obtains the coefficients 
$r_{+3} = .786$, $r_{+4} = .800$, and $r_{+5} = .710$. He concludes from 
these coefficients, first, "that the cycles in the yield per acre 
of the crops are intimately related to the cycles in the activity 
of industry, and that it takes between one and two years for 
good and bad crops to produce the maximum effect upon the 
activity of the pig-iron industry" and, second, that "the 
cycles in the yield per acre of the crops are 
intimately connected with the cycles of general prices, and the 
lag in the cycles of general prices is approximately four years."‡ 
It seems to me that this conclusion is not warranted because of 
the poor fit of a linear secular trend to pig-iron production. 
The ordinate of the secular trend is negative for the years 1871, 
1872, and 1873. The deviations from the secular trend are all 
positive for the periods 1871-1877 and 1902-1910 and all 
negative for the period 1878-1901. The deviations of crop 
yield are, with few exceptions, positive from 1871 to 1879 and 
1903 to 1910 and negative from 1880 to 1902.§ It appears 
probable, then, that the correlation coefficients upon which he 
relies contain a large spurious element. At any rate the 
differences between the coefficients, amounting to less than .02 
in most cases, on which his judgment is based, cannot be 
considered significant. 

* Moore, H. L. Economic Cycles, p. 110. 
† Ibid., p. 122. 
‡ Moore, H. L. Economic Cycles, pp. 110, 122. 
§ Ibid., p. 131. 

Waiving the question of Moore's use of three-year progressive 
averages to form the series for which correlation coefficients 
are computed, which usage throws serious doubt upon 
the reliability of his conclusions, I will test the correlation 
between the series of three-year averages by computing the 
coefficients between first differences of those items. The 
coefficients are given in Table XIV. The coefficients between 
(1) crop-yield and pig-iron production and between (2) crop-yield 
and general prices are not significant. The former group 
of coefficients (1) shows a curious alternation in value which, 
examination of the basic series demonstrates, is due to a few 
predominating items in pig-iron production after 1905. The 
latter group of coefficients (2) shows a maximum at four-years 
lag of prices but the coefficient ($r'_{+4} = +.39$) is not much larger 
than the maximum coefficient ($r'_{-2} = +.32$) found for first 
differences of the two random series previously analyzed (see 
Table V). The coefficients for pig-iron production and general 
prices (3) reach a maximum ($r'_{+1} = +.59$) at one year lag of 
prices. This coefficient is probably significant. It gives, 
however, a result at variance with Moore's conclusions but 
consistent with the conclusions obtained when the period 
1879-1913 was divided into two sub-periods (see Table XIII). 
Moore's use of three-year averages probably results in much 
higher coefficients than would result from annual figures. 
Even so, the coefficients between first differences are small for 
crops and the indices of the industrial and business activity. 
Moore's object was to show that the cycles of business reflect 
the cycles in crops, not merely, having assumed that cycles 
exist, to find the lag. For this object a good "fit" of secular 
trend to data is imperative. The method of first differences, 
then, is valuable because it reveals spurious correlation between 
deviations from secular trends when the fit is not good. 
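
The use of first differences as a check on a poorly fitted trend can be illustrated with constructed data. The following is only a sketch under stated assumptions: both hypothetical series fall for twenty items and then rise, their cycles are built to be exactly uncorrelated, and a single straight line is fitted to each. None of the names or numbers comes from Moore's data or from the tables of this paper.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment coefficient of correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def linear_residuals(y):
    """Deviations of y from its least-squares straight line over t = 0, 1, ..."""
    n = len(y)
    mt, my = (n - 1) / 2, sum(y) / n
    slope = (sum((t - mt) * (v - my) for t, v in enumerate(y))
             / sum((t - mt) ** 2 for t in range(n)))
    return [v - (my + slope * (t - mt)) for t, v in enumerate(y)]

def first_differences(series):
    """Differences between successive items of a series."""
    return [b - a for a, b in zip(series, series[1:])]

t = range(41)
trend = [abs(ti - 20) for ti in t]                    # falls, then rises
x = [tr + 3 * (1, 0, -1, 0)[ti % 4] for ti, tr in zip(t, trend)]
y = [2 * tr + 3 * (0, 1, 0, -1)[ti % 4] for ti, tr in zip(t, trend)]

r_deviations = pearson(linear_residuals(x), linear_residuals(y))
r_first_diff = pearson(first_differences(x), first_differences(y))
print(round(r_deviations, 2), round(r_first_diff, 2))
```

In this example the coefficient for deviations from the badly fitting straight lines is high, while that for first differences is low. The first differences reduce the spurious element without removing it entirely, since the change of slope in the trend is common to both series.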

TABLE XIV. 

(A) COEFFICIENTS OF CORRELATION BETWEEN DEVIATIONS OF YIELD PER ACRE OF 
NINE CROPS FROM THE LINEAR SECULAR TREND AND SIMILAR DEVIATIONS OF 
PIG-IRON PRODUCTION AND OF GENERAL PRICES, 1870-1911; ITEMS OF THE VARIOUS 
SERIES ARE THREE-YEAR PROGRESSIVE AVERAGES. 

(B) COEFFICIENTS OF CORRELATION BETWEEN FIRST DIFFERENCES OF THE 
RESPECTIVE DEVIATIONS. 

The subscript i in r_i indicates the lag in prices and pig-iron production compared with crops, or of prices as 
compared with pig-iron. 

(A) Moore's Coefficients. (a) 

Crop Yield Correlated with:            r_0       r_{+1}    r_{+2}    r_{+3}    r_{+4}    r_{+5} 
Pig-Iron Production                    .625      .719      .718      .697      .572 
General Prices                                                       .786      .800      .710 

(B) Coefficients between First Differences. 

Crop Yield Correlated with:            r'_0      r'_{+1}   r'_{+2}   r'_{+3}   r'_{+4}   r'_{+5} 
Pig-Iron Production                    -.02      +.26      +.06      +.25      -.12 
General Prices                         -.11      -.08      +.21      +.32      +.39      +.27 

Pig-Iron Production Correlated with:   r'_{-1}   r'_0      r'_{+1}   r'_{+2}   r'_{+3}   r'_{+4} 
General Prices                         +.25      +.51      +.59      +.44      +.25 

(a) Moore, H. L. Economic Cycles, pp. 110-122. Series on pp. 131, 134. 



Résumé. 

The variate difference correlation method has been in- 
vented to eliminate spurious correlation due to position of 
items in time or space. 

The method involves the assumption that the taking of 
multiple differences leads to series of random variates. In 
practice for short series this assumption is not fulfilled. 
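
The taking of multiple differences can be sketched in a few lines. The helper `difference` and the sample series are hypothetical. The example shows only the well-known property that differencing a polynomial trend of degree k a total of k times leaves a constant, and that each differencing shortens the series by one item.

```python
def difference(series, order=1):
    """Return the differences of the given order of a series."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

# A hypothetical series consisting of a pure parabolic trend, 3 + 2t + 5t^2.
quadratic = [3 + 2 * t + 5 * t * t for t in range(8)]

second = difference(quadratic, 2)   # constant: the trend is reduced to 10
third = difference(quadratic, 3)    # zero: nothing of the trend remains
print(second, third)
```

A series of eight items yields only five third differences, which suggests why the assumption is hard to satisfy for short series.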

Coefficients for higher differences of short series tend to al- 
ternate in sign and to conceal rather than to reveal the nature 
of the correlation between the series being tested. 

Stability of coefficients for higher differences appears to 
have little significance for short series, and perhaps for long 
series as well. The assumption that the series correlated are 
made up of variates "randomly distributed in time," if ful- 
filled, will lead to stable coefficients for successive differences. 
However, though this condition is sufficient for stability it is 
not necessary. 

In testing economic series for correspondence of their 
cyclical fluctuations, especially in determining the relative 
position of the cycles upon the assumption that there are 
cycles, the correlation coefficients between deviations from a 
linear secular trend together with coefficients for first differ- 
ences constitute a reliable basis for judgment. 

When the question is one of the existence or non-existence 
of similar cycles in two time series great care must be used in 
the choice of the function used to represent the secular trend 
and in the nature of the fit of the curve or line to the data. The 
method of first differences is an extremely valuable aid in 
investigating such a question. 

Coefficients of correlation between second differences may 
give information concerning minor oscillations as distinct from 
secular trend and major cycles. Even for this purpose the 
use of higher than second differences appears to be unreliable, 
especially so for short series. The coefficients of correlation 
between second differences are identical with those between 
deviations from three-year progressive averages. 
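
The identity stated above follows from a line of algebra, assuming the three-year average is centered on the middle year: $x_t - \tfrac{1}{3}(x_{t-1} + x_t + x_{t+1}) = -\tfrac{1}{3}(x_{t+1} - 2x_t + x_{t-1})$. Each deviation is thus the second difference multiplied by $-\tfrac{1}{3}$, and since both series are multiplied by the same constant the coefficient of correlation is unchanged. The sketch below checks this numerically. The function names are illustrative, and the data are the first ten items of columns 1 and 5 of Table XIII.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment coefficient of correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def second_differences(s):
    """x[t+1] - 2*x[t] + x[t-1] for every interior item."""
    return [s[i + 1] - 2 * s[i] + s[i - 1] for i in range(1, len(s) - 1)]

def deviations_from_3yr_average(s):
    """Item minus the average of itself and its two neighbors (centering assumed)."""
    return [s[i] - (s[i - 1] + s[i] + s[i + 1]) / 3 for i in range(1, len(s) - 1)]

# Prices and pig-iron production, 1879-1888 (Table XIII, columns 1 and 5).
prices = [118.2, 130.8, 129.3, 132.7, 129.7, 121.6, 113.8, 112.4, 113.3, 115.2]
pig_iron = [27, 38, 41, 46, 46, 41, 40, 57, 64, 65]

r_second = pearson(second_differences(prices), second_differences(pig_iron))
r_average = pearson(deviations_from_3yr_average(prices),
                    deviations_from_3yr_average(pig_iron))
print(abs(r_second - r_average) < 1e-9)
```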

The method of measuring correlation between cycles of time 
series, that is both easy of application and reliable, is the 
method of first differences. In general, however, this method 
should be supplemented by curve fitting. To secure a picture 
of the cycles, it is, of course, necessary to take deviations from 
a closely fitted curve. 

Finally, curve fitting to eliminate the secular trend of a time 
series should always be adapted to the problem in hand and 
interpretation of coefficients of correlation between time 
series should be made with continual reference to the funda- 
mental data. Important light may be secured by dividing 
statistical series into more homogeneous sub-series and analyz- 
ing the latter. The nature of the data is as important as 
the method to be applied. Rules-of-thumb concerning 
method or data are apt to lead to pitfalls. 



